arXiv:1508.06467v1 [physics.soc-ph] 26 Aug 2015


Ingo Scholtes, Nicolas Wider, Antonios Garas: 

Higher-Order Aggregate Networks in the Analysis of Temporal Networks 


Higher-Order Aggregate Networks in the Analysis of 
Temporal Networks: Path structures and centralities 


Ingo Scholtes, Nicolas Wider, Antonios Garas 

Chair of Systems Design, ETH Zurich, Switzerland 

www.sg.ethz.ch 


Abstract 

Recent research on temporal networks has highlighted the limitations of a static network
perspective for our understanding of complex systems with dynamic topologies. In particular,
recent works have shown that i) the specific order in which links occur in real-world temporal
networks affects causality structures and thus the evolution of dynamical processes, and
ii) higher-order aggregate representations of temporal networks can be used to analytically
study the effect of these order correlations on dynamical processes. In this article we analyze
the effect of order correlations on path-based centrality measures in real-world temporal
networks. Analyzing temporal equivalents of betweenness, closeness and reach centrality in
six empirical temporal networks, we first show that an analysis of the commonly used static,
time-aggregated representation can give misleading results about the actual importance of
nodes. We further study higher-order time-aggregated networks, a recently proposed generalization
of the commonly applied static, time-aggregated representation of temporal networks.
Here, we particularly define path-based centrality measures based on second-order aggregate
networks, empirically validating that node centralities calculated in this way better capture
the true temporal centralities of nodes than node centralities calculated based on the commonly
used static (first-order) representation. Apart from providing a simple and practical
method for the approximation of path-based centralities in temporal networks, our results
highlight interesting perspectives for the use of higher-order aggregate networks in the analysis
of time-stamped network data.


1 Introduction 

The network perspective has provided valuable insights into the structure and dynamics of
numerous complex systems in nature, society and technology. However, most of the complex
systems studied from this perspective are not static, but rather exhibit time-varying
interaction topologies in which elements are only linked to each other at specific times or
during particular time intervals. While the topological characteristics resulting from which
elements are linked to which other elements have been studied extensively, the importance
of the additional temporal dimension resulting from when these links occur has become clear
only recently. And despite an increasing volume of research, its full impact on the properties
of complex systems and on the evolution of dynamical processes still eludes our
understanding [...].

Addressing this open issue, different strands of research have focused on the question of
how different types of temporal characteristics of complex networked systems - such as
the activation times of nodes, the inter-event times between links, the duration and/or
concurrency of interactions, or the order in which these interactions occur - affect the
properties of temporal networks as well as dynamical processes evolving on them. For
a number of systems, it was shown that inter-event times follow heavy-tailed distributions
which in turn significantly influence the speed of processes like spreading and
diffusion [...].

Apart from the timing of interactions, the order in which these interactions occur is
another important characteristic of temporal networks. Not only does the ordering of interactions
crucially affect causality in temporal networks, it has also been shown to dramatically
shift the evolution of dynamical processes compared to what we would expect based on a
static, time-aggregated perspective [...]. Some of these works have further
taken a modeling perspective, highlighting that real-world temporal network data exhibit
non-Markovian characteristics in the sequence of links which are not in line with the
Markovianity assumption that is (implicitly) made when studying static representations of
time-varying complex networks. Neglecting these non-Markovian characteristics not only leads to
wrong results about dynamical processes, it also leads to wrong centrality-based rankings of
nodes, as well as misleading results about community structures [25] [24].

The main reason why an analysis of static, time-aggregated networks yields misleading
results about the properties of temporal networks is that the ordering of links can alter path
structures in temporal networks compared to what we would expect based on their static
topology. Precisely, in a static network the presence of two links (a, b) and (b, c) connecting
nodes a to b and b to c respectively necessarily implies that a path from a via b to c exists.
However, in a temporal network, for a to be able to influence c the link (a, b) must occur before
the link (b, c), and thus the presence of a path depends on the ordering of links. This simple
example highlights that the mere ordering of links in temporal networks can introduce an
additional temporal-topological dimension that can neither be understood from the analysis
of static, time-aggregated representations, nor from the analysis of inter-event times or node
activity distributions [...].

Highlighting the important consequences introduced by the specific ordering of links 
in real-world temporal networks, in this article we study how this ordering affects path- 
based centrality measures in temporal networks. The main contributions of our work can be 
summarized as follows: 

1. Building on the concept of time-respecting paths with a maximum time difference
between consecutive links as previously discussed in [...], we introduce three different
notions of path-based temporal node centralities which emphasize the additional
temporal-topological dimension that is introduced due to the ordering of links in temporal
networks. In particular, we formally define temporal variations of betweenness,
closeness and reach centrality and demonstrate how they can be computed based on
the topology of shortest time-respecting paths emerging in temporal networks.

2. Calculating these temporal centrality measures for six empirical data sets, we quantify
to what extent a ranking of nodes based on temporal centralities coincides with a
ranking of nodes based on the same measures calculated on the corresponding
static, time-aggregated networks. From our results we conclude that, possibly
due to non-Markovian characteristics previously highlighted in [...], a static analysis
of node centralities yields misleading results about the importance of nodes with
respect to time-respecting paths.

3. Generalizing the usual time-aggregated static perspective on temporal networks, we
further develop the second-order time-aggregated representations introduced in [25],
obtaining higher-order time-aggregated representations which can be conveniently analyzed
using standard network-analytic methods. Notably, despite being static representations
of temporal networks, we show that these higher-order representations allow
us to incorporate those order correlations that have been shown to influence the causal
topologies of temporal networks.

4. We finally define generalizations of static betweenness, closeness and reach centrality
based on a second-order aggregate representation of temporal networks. Using six data
sets on temporal networks, we show that these second-order generalizations of centralities
constitute highly accurate approximations for the true temporal centrality of nodes
calculated based on the detailed time-respecting path structures in temporal networks.

The remainder of this article is structured as follows: In section 2 we first introduce basic
concepts such as our notion of temporal networks, time-aggregated and time-unfolded
representations of temporal networks, as well as time-respecting paths with maximum time
differences between consecutive links. In section 3 we introduce the framework of higher-order
time-aggregated networks, a simple abstraction of temporal networks that takes into
account the statistics of time-respecting paths up to a given length. In section 4 we finally
define three temporal centrality measures which account for the temporal-topological
characteristics introduced by the shortest time-respecting path structures in real-world temporal
networks. Comparing the importance of nodes according to i) temporal centralities, ii) centralities
calculated based on a commonly used static, time-aggregated representation, and
iii) second-order centralities calculated based on a static, second-order time-aggregated
representation, we show that higher-order aggregate networks provide interesting perspectives
for the analysis of temporal networks. We finally conclude our article by a summary of key
contributions and a discussion of open issues and future work.

Figure 1: Time-unfolded and weighted static, time-aggregated representation of two temporal
networks $G_1$ and $G_2$. Panels: (a) temporal network $G_1$; (b) temporal network $G_2$;
(c) weighted, time-aggregated representation of both $G_1$ and $G_2$.


2 Temporal Networks and Time-respecting Paths 

In this section, we formally introduce the basic concepts and definitions used throughout 
our work. In particular, we define the notion of a temporal network used throughout this 
article, as well as time-respecting paths which are the basis for the notions of distances and 
path-based centralities in temporal networks which will be used in subsequent sections. 

2.1 Temporal, time-aggregated and time-unfolded networks 

We define a temporal network $G^T = (V, E^T)$ as a tuple consisting of a set of nodes $V$ and a
set $E^T \subseteq V \times V \times [0, T]$ of time-stamped links $(v, w; t) \in E^T$ for an
observation period $[0, T]$. Importantly, we assume discrete time stamps $t \in [0, T]$ and
time-stamped links $(v, w; t)$ which indicate the presence of the link $(v, w)$ at time $t$.
This “instantaneous” definition particularly does not allow links to be assigned a duration,
i.e. we cannot directly assign links a time interval during which they exist. However, we can
nevertheless represent links that persist for some time interval $[t_{start}, t_{end}]$ by
assuming some small unit of discrete time $\Delta t$ and adding multiple time-stamped links
$(v, w; t)$ at time stamps $t = t_{start}, t_{start} + \Delta t, t_{start} + 2\Delta t, \ldots, t_{end}$.
These assumptions naturally lend themselves to real-world time-stamped data
sets, which are typically obtained based on some sort of sampling, whose sampling frequency
defines the smallest unit of time $\Delta t$.




For illustrative purposes it is often useful to be able to visualize temporal networks.
Throughout this article, we will use so-called time-unfolded networks, a simple and intuitive
static representation of temporal networks which, in different variants, has been used in
a number of previous works [...]. The key idea of this two-dimensional static
representation is to arrange all nodes on a horizontal dimension, while unfolding time to an
additional vertical dimension as illustrated in Fig. 1. For an observation period $[0, \ldots, T]$ and
a given $\Delta t$ we can then add temporal copies of all nodes for all possible time steps
$k \Delta t$ (for $k = 0, 1, \ldots$). For simplicity, in the following we assume $\Delta t = 1$,
which allows us to denote the temporal copies of a node $v$ as $v_t, v_{t+1}, v_{t+2}, \ldots$.
The main benefit of this construction is that it allows us to represent a time-stamped link
$(v, w; t)$ by means of a static link $(v_t, w_{t+1})$ connecting the temporal copies $v_t$ and
$w_{t+1}$ of node $v$ and node $w$ respectively. The intuition
behind this notation is that a quantity residing at node $v$ at time $t$ can move to node $w$ via a
time-stamped link $(v, w; t)$, arriving there at the next time step $t + 1$. Two simple examples
for time-unfolded static representations of two different temporal networks with five nodes
and eight time-stamped links are shown in Fig. 1(a) and 1(b).
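The unfolding step can be sketched in a few lines of Python. This is only a minimal illustration, assuming $\Delta t = 1$ and a temporal network stored as a list of `(v, w, t)` tuples; the function name `unfold` and the two toy links are hypothetical and are not taken from Fig. 1.

```python
def unfold(links):
    """Map each time-stamped link (v, w; t) to a static link between
    the temporal copies v_t and w_{t+1} (assumes a time unit of 1)."""
    return [((v, t), (w, t + 1)) for (v, w, t) in links]


# Hypothetical toy input: two time-stamped links at consecutive time stamps.
toy_links = [("a", "c", 1), ("c", "d", 2)]
static_links = unfold(toy_links)
```

In this bare construction, the static path from the copy `("a", 1)` via `("c", 2)` to `("d", 3)` corresponds to two time-stamped links that occur at immediately consecutive time stamps.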

Despite the recent development of methods to study temporal networks, the most widespread
way to study time-stamped network data is to aggregate all time-stamped links into
a static, time-aggregated network $G = (V, E)$. This means that, given a temporal network
$G^T = (V, E^T)$, two nodes $v, w \in V$ are connected in the static network whenever a
time-stamped link exists at any time stamp, i.e., $(v, w) \in E$ iff $(v, w; t) \in E^T$ for
some $t \in [0, T]$. Additional information about the statistics of time-stamped links in the
underlying temporal network can be preserved by considering a weighted time-aggregated network, in which
weights $\omega(v, w)$ indicate the number of times time-stamped links $(v, w; t)$ have been active
during the observation period. That is, we consider a weighted time-aggregated network with a
weight function $\omega : E \to \mathbb{N}$ defined as

\[ \omega(v, w) := \left| \{ t \in [0, T] \mid (v, w; t) \in E^T \} \right| . \]

Figure 1(c) shows the weighted, time-aggregated network corresponding to the two temporal
networks shown in Fig. 1(a) and 1(b). These simple examples highlight the important fact
that different temporal networks are consistent with the same weighted, time-aggregated
network. This is due to the fact that in the time-aggregated network we lose all information
on both the timing and the ordering of links in the temporal network.
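This loss of ordering information can be checked directly. The following sketch assumes that the eight time-stamped links of $G_1$ and $G_2$ are those listed in the example of Section 3.2 (the figure itself is not reproduced here); the variable and function names are illustrative only.

```python
from collections import Counter

# Time-stamped links (v, w, t) of the two example networks, taken from the
# path lists in Section 3.2 (assumed to match Fig. 1).
G1 = [("a", "c", 1), ("c", "e", 2), ("b", "c", 3), ("c", "d", 4),
      ("b", "c", 7), ("c", "e", 8), ("a", "c", 10), ("c", "d", 11)]
G2 = [("a", "c", 1), ("c", "e", 2), ("b", "c", 3), ("c", "d", 4),
      ("b", "c", 7), ("c", "d", 8), ("a", "c", 10), ("c", "e", 11)]


def aggregate(links):
    """Weight of (v, w) = number of distinct time stamps at which it occurs."""
    return Counter((v, w) for (v, w, t) in set(links))


w1, w2 = aggregate(G1), aggregate(G2)
```

Under these assumptions both networks yield the same four weighted links, although two of their time-stamped links occur in a different order.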


2.2 Time-respecting paths 

Importantly, both the timing and the ordering of links influence path structures in temporal
networks. In particular, in the context of temporal networks we must consider time-respecting
paths, an extension of the concept of paths in static network topologies which additionally
respects the timing and ordering of time-stamped links [...]. For the remainder of this
paper, we define a time-respecting path between a source node $v$ and a target node $w$ to be
any sequence of time-stamped links

\[ (v_0, v_1; t_1), (v_1, v_2; t_2), \ldots, (v_{l-1}, v_l; t_l) \]

such that $v_0 = v$, $v_l = w$ and the sequence of time-stamps is increasing, i.e.
$t_1 < t_2 < \ldots < t_l$.
The latter condition on the ordering of links is particularly important since it is a necessary
condition for causality. This means that for any temporal network a node a is able to influence
node c based on two time-stamped links (a, b) and (b, c) only if link (a, b) has occurred before
link (b, c). A simple example for a time-respecting path (a, c; 1), (c, d; 4) can be seen in
Fig. 1(a), where the time-unfolded representation of the temporal network $G_1$ is illustrated.
At this point, it is important to note that, different from the usual notion of paths in static
networks, the question whether a time-respecting path exists between two nodes requires us to
specify a start time $t_0 \le t_1$. In the example of Fig. 1(a) we observe a time-respecting path
$(a, c; t_1 = 1), (c, d; t_2 = 5)$ between node a and d, which can only be taken if we consider
paths starting at node a at time $t_0 = 1$. If instead we were to ask for a time-respecting
path between a and d starting at node a at time $t_0 = 5$, our only choice would be the path
(a, c; 10), (c, d; 11).


2.3 Time-respecting paths with a maximum time difference 

In the definition of a time-respecting path above, we have required that the sequence of 
time stamps of the links constituting the path must be increasing. Clearly, this condition is 
rather weak since it makes no assumptions whatsoever about the time difference between two 
consecutive time-stamped links on a time-respecting path. As such, for the mere existence of 
a time-respecting path in a temporal network evolving over a period of years, it is actually 
not important whether the time difference between two consecutive links is a few seconds or 
a few years. 

However, we typically study time-respecting path structures because they constitute the
substrate for the evolution of dynamical processes which have intrinsic time scales that are
much smaller than the period during which we observe a temporal network. In the study of
time-respecting paths, it is thus often reasonable to impose a maximum time difference $\delta$, i.e.
we limit the temporal gaps between two consecutive time-stamped links that are considered
to contribute to a time-respecting path to a maximum of $\delta$ [...]. In this case, rather
than requiring a mere increasing sequence of time-stamps, we demand that the condition
$0 < t_{i+1} - t_i \le \delta$ must be fulfilled for all $i = 1, \ldots, l - 1$. For a maximum time difference
of $\delta = 1$, we thus limit ourselves to the study of time-respecting paths for which all
time-stamped links occur at immediately consecutive time stamps. As another limiting case, we
can consider $\delta = \infty$, which means that we impose no further condition apart from the
requirement that the sequence of time stamps of links on a time-respecting path is increasing.
Revisiting the example of Fig. 1(a), we observe that the time-respecting path (a, c; 1), (c, d; 5)
only exists if we allow for a maximum time difference $\delta = 4$, while for all $\delta < 4$ the only
time-respecting path between the nodes a and d is (a, c; 10), (c, d; 11).
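The condition on consecutive time stamps can be written as a small check. The sketch below is an illustration rather than the authors' implementation: the function name is hypothetical, paths are lists of `(v, w, t)` tuples, and the upper bound is taken to be inclusive (a gap of exactly $\delta$ is allowed), which is an assumption inferred from the case $\delta = 1$ described above. The two paths are the ones quoted in the text.

```python
import math


def is_time_respecting(path, delta=math.inf):
    """Check a sequence of time-stamped links (v, w, t).

    Consecutive links must share a node and satisfy
    0 < t_next - t_prev <= delta (inclusive bound is an assumption).
    """
    for (_, w, t), (v_next, _, t_next) in zip(path, path[1:]):
        if w != v_next or not (0 < t_next - t <= delta):
            return False
    return True


long_gap = [("a", "c", 1), ("c", "d", 5)]     # gap of 4 time units
short_gap = [("a", "c", 10), ("c", "d", 11)]  # gap of 1 time unit
```

With these definitions, `long_gap` is accepted for $\delta = 4$ and rejected for $\delta = 3$, while `short_gap` is accepted even for $\delta = 1$.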


2.4 Shortest and fastest time-respecting paths 

Let us now formally define the length of time-respecting paths in a temporal network, which
will allow us to define the notion of shortest time-respecting paths used throughout our work.
Due to the additional temporal dimension, the length of a time-respecting path
$(v_0, v_1; t_1), \ldots, (v_{l-1}, v_l; t_l)$
can be studied both from a topological and a temporal perspective. Following the usual
terminology, we call the number $l$ of time-stamped links on a time-respecting path the (topological)
length of the path. We further call the time difference $t_l - t_1 + 1$ the duration of the path.
Here the increment by one accounts for the duration of the final link $(v_{l-1}, v_l; t_l)$, i.e. for the
fact that any process starting at node $v_0$ at time $t_1$ will only reach node $v_l$ at time $t_l + 1$.

Having defined both the length and duration of time-respecting paths, it is now trivial to
define the shortest time-respecting path between two nodes $v$ and $w$ as the time-respecting
path with the smallest (topological) length. In analogy, we define the fastest time-respecting
path as the time-respecting path with the smallest (temporal) duration. Following our previous
comment about the necessity to define a start time $t_0$ for a time-respecting path, it is
clear that the shortest or fastest time-respecting path can only be found unambiguously with
respect to a given start time $t_0$, i.e. at different times during the evolution of a temporal
network the same pair of nodes can be connected by different shortest or fastest time-respecting
paths.
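As a small worked illustration of the two notions, the following sketch computes length and duration for the two paths between a and d quoted in Section 2.3. The helper names are hypothetical; paths are again assumed to be lists of `(v, w, t)` tuples.

```python
def length(path):
    """Topological length: the number of time-stamped links on the path."""
    return len(path)


def duration(path):
    """Temporal duration t_l - t_1 + 1; the +1 accounts for the final link."""
    return path[-1][2] - path[0][2] + 1


p = [("a", "c", 1), ("c", "d", 5)]
q = [("a", "c", 10), ("c", "d", 11)]
```

Both paths have the same topological length, so neither is shorter than the other, but `q` has the smaller duration and is therefore the faster of the two.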


2.5 Transitivity of paths in static and temporal networks 


Let us conclude this preliminary section by highlighting important differences between paths
in static networks compared to time-respecting paths in temporal networks, that result
from the ordering and timing of links. Let us first highlight that paths in static networks
are transitive. This means that from the presence of two paths $(v_0, v_1), \ldots, (v_{k-1}, v_k)$ and
$(v_k, v_{k+1}), \ldots, (v_{l-1}, v_l)$ between $v_0$ and $v_k$ and between $v_k$ and $v_l$ respectively, we can
conclude that a path $(v_0, v_1), \ldots, (v_{l-1}, v_l)$ between nodes $v_0$ and $v_l$ necessarily exists.$^1$ This
transitivity has the important mathematical consequence that the entries in the $k$-th power
$A^k$ of the adjacency matrix $A$ of a static network topology count all possible paths of length
$k$ between all possible pairs of nodes. Furthermore, transitivity of paths is the basis for a
wealth of algebraic network-analytic methods such as spectral partitioning, the analysis of
dynamical processes based on eigenvectors and eigenvalues, or the computation of centrality
measures that are based on eigenvalue problems.
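The counting property of $A^k$ can be illustrated on a small static network. The sketch below assumes the first-order topology of the running example, with links a to c, b to c, c to d and c to e as implied by the link lists in Section 3.2, and uses a plain-Python matrix product; all names are illustrative.

```python
nodes = ["a", "b", "c", "d", "e"]
edges = [("a", "c"), ("b", "c"), ("c", "d"), ("c", "e")]

# Build the adjacency matrix A of the static network.
idx = {n: i for i, n in enumerate(nodes)}
A = [[0] * len(nodes) for _ in nodes]
for v, w in edges:
    A[idx[v]][idx[w]] = 1


def matmul(X, Y):
    """Plain-Python product of two square matrices."""
    n = len(X)
    return [[sum(X[i][k] * Y[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


A2 = matmul(A, A)  # entries count paths of length two
paths_a_to_d = A2[idx["a"]][idx["d"]]
```

Here `A2` records exactly one path of length two from a to d (via c), regardless of when the underlying links occurred.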

Notably, the property of transitivity of paths in static networks does not extend to
time-respecting paths in temporal networks. Here, two time-respecting paths
$(v_0, v_1; t_1), \ldots, (v_{k-1}, v_k; t_k)$ and
$(v_k, v_{k+1}; t_{k+1}), \ldots, (v_{l-1}, v_l; t_l)$ only translate into a time-respecting path
between $v_0$ and $v_l$ if $t_k < t_{k+1}$ and, assuming that we impose a maximum time
difference $\delta$, if $0 < t_{k+1} - t_k \le \delta$.

$^1$ Note though that this transitive path may or may not be the shortest path between the two nodes.

The simple observation that transitivity of paths holds in static networks, while it does not
necessarily hold in temporal networks, implies that by an analysis of static, time-aggregated
networks, we may overestimate transitivity in temporal networks. We can again illustrate this
using our simple example of Fig. 1, which shows two temporal networks $G_1$ and $G_2$ that are
both consistent with the same (weighted) time-aggregated network shown in Fig. 1(c). Here,
judging from the presence of a path (a, c), (c, d) in the time-aggregated network, we may
think that a time-respecting path connecting node a to d exists in the underlying temporal
network. Looking at the two temporal networks $G_1$ and $G_2$ shown in Fig. 1(a) and Fig. 1(b)
respectively, we see that at least for small values for the maximum time difference $\delta$ (such as
$\delta = 1$) a corresponding time-respecting path only exists in the temporal network $G_1$, while
it is absent in $G_2$.
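This difference between $G_1$ and $G_2$ can be reproduced with a breadth-first search over time-stamped links. The sketch below is one possible way to do this, not the authors' implementation: it assumes the link lists given in Section 3.2, allows any start time, and treats the bound $\delta$ as inclusive; the function name is hypothetical.

```python
from collections import deque
import math

G1 = [("a", "c", 1), ("c", "e", 2), ("b", "c", 3), ("c", "d", 4),
      ("b", "c", 7), ("c", "e", 8), ("a", "c", 10), ("c", "d", 11)]
G2 = [("a", "c", 1), ("c", "e", 2), ("b", "c", 3), ("c", "d", 4),
      ("b", "c", 7), ("c", "d", 8), ("a", "c", 10), ("c", "e", 11)]


def shortest_length(links, source, target, delta=math.inf):
    """Length of the shortest time-respecting path from source to target,
    or None if no such path exists (any start time is allowed)."""
    # Search states are time-stamped links; a link may follow another one
    # if it starts where the previous link ended and the time gap lies
    # in (0, delta].
    queue = deque((link, 1) for link in links if link[0] == source)
    seen = {link for link, _ in queue}
    while queue:
        (v, w, t), dist = queue.popleft()
        if w == target:
            return dist
        for nxt in links:
            if nxt not in seen and nxt[0] == w and 0 < nxt[2] - t <= delta:
                seen.add(nxt)
                queue.append((nxt, dist + 1))
    return None
```

For $\delta = 1$ the search finds a path of length two from a to d in `G1` and none in `G2`; without a bound on the time difference, `G2` also contains such a path, in line with the static picture.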


3 Higher-Order Aggregate Networks 

In the previous section we have seen that for large maximum time differences $\delta$ we expect
the shortest time-respecting paths to be rather similar to the shortest paths in a static,
time-aggregated representation. This is an intuitive result since by using large maximum time
differences $\delta$, we apply an implicit “aggregation” of time stamps which may nevertheless be
far apart in the temporal dimension. At the same time, we observe that for small values of
$\delta$ the temporal characteristics of the network result in time-respecting path structures that
are markedly different from those in the static, time-aggregated network. As argued above,
this implies that dynamical processes which evolve at time scales similar to that of the
temporal network will be significantly affected by these path structures. It further questions
the usefulness of path-based centrality measures that are computed based on the commonly
used time-aggregated representation of temporal networks.

In this section, we introduce higher-order time-aggregated networks, a simple yet powerful
abstraction of temporal networks which can be used to address some of the aforementioned
problems. It can be seen as a simple generalization of the usual first-order time-aggregated
representation introduced in Section 2, and it has recently been shown to provide interesting
insights about the evolution of dynamical processes in temporal networks [25].

3.1 k-th order aggregate networks

The key idea behind this abstraction is that the commonly used time-aggregated network is
the simplest possible time-aggregated representation whose weighted links capture the
frequencies of time-stamped links. Considering that each time-stamped link is a time-respecting
path of length one, it is easy to generalize this abstraction to higher-order time-aggregated
networks in which weighted links capture the frequencies of longer time-respecting paths. For
a temporal network $G^T = (V, E^T)$ we thus formally define a $k$-th order time-aggregated (or
simply aggregate) network as a tuple $G^{(k)} = (V^{(k)}, E^{(k)})$ where $V^{(k)} \subseteq V^k$ is a set of node
$k$-tuples and $E^{(k)} \subseteq V^{(k)} \times V^{(k)}$ is a set of links. For simplicity, we call each of the $k$-tuples
$v = v_1 - v_2 - \ldots - v_k$ ($v \in V^{(k)}$, $v_i \in V$) a $k$-th order node, while each link $e \in E^{(k)}$ is called
a $k$-th order link. We further assume that a $k$-th order link $(v, w)$ between two $k$-th order
nodes $v = v_1 - v_2 - \ldots - v_k$ and $w = w_1 - w_2 - \ldots - w_k$ exists if they overlap in exactly $k - 1$
elements such that $v_{i+1} = w_i$ for $i = 1, \ldots, k - 1$. The basic idea behind this construction is
that each $k$-th order link $(v, w)$ represents a possible time-respecting path of length $k$ in the
underlying temporal network, which connects node $v_1$ to node $w_k$ via $k$ time-stamped links

\[ (v_1, v_2 = w_1; t_1), \ldots, (v_k = w_{k-1}, w_k; t_k) \tag{1} \]

In analogy to the weights in a usual (first-order) aggregate representation, we further define
the weights of such $k$-th order links by the frequency of the underlying time-respecting paths
in the temporal network. Considering a maximum time difference $\delta$ and two $k$-th order nodes
$v = v_1 - v_2 - \ldots - v_k$ and $w = w_1 - w_2 - \ldots - w_k$ we thus define

\[ \omega(v, w) := |P(v, w, \delta)| \]

where

\[ P(v, w, \delta) = \{ (v_1, v_2 = w_1; t_1), \ldots, (v_k = w_{k-1}, w_k; t_k) : 0 < t_{i+1} - t_i \le \delta \} \]

is the set of all time-respecting paths in the temporal network that i) consist of the sequence
of links indicated in Eq. (1) and ii) are consistent with a given maximum time difference of $\delta$.

The higher-order aggregate network construction introduced above has a number of
advantages. First and foremost, it provides a simple static abstraction of a temporal network
which can be studied by means of standard methods from (static) network analysis. Each
static path of length $l$ in a $k$-th order aggregate network can be mapped to a time-respecting
path of length $k + l - 1$ in the original network. Importantly, and different from a first-order
representation, $k$-th order aggregate networks allow us to capture non-Markovian characteristics
of temporal networks. In particular, they allow us to represent temporal networks in which
the $k$-th time-stamped link $(v_k = w_{k-1}, w_k)$ on a time-respecting path depends on the $k - 1$
previous time-stamped links on this path. With this, we obtain a simple static network topology
that contains information both on the presence of time-stamped links in the underlying
temporal network, as well as on the ordering in which sequences of $k$ of these time-stamped
links occur.


3.2 Example: second-order aggregate networks 

In the following, we illustrate our approach by constructing second-order aggregate
representations of the two temporal networks $G_1$ and $G_2$ shown in Fig. 1. Both $G_1$ and $G_2$ are
consistent with the same first-order time-aggregated network. We can easily generate second-order
time-aggregated networks of the two temporal networks by extracting all time-respecting
paths of length two (and assuming a given maximum time difference $\delta$). For simplicity, in
the following we limit our study to $\delta = 1$. For the temporal network $G_1$ shown in Fig. 1(a),
we observe the following four different time-respecting paths of length two:

(a, c; 1), (c, e; 2)
(b, c; 3), (c, d; 4)
(b, c; 7), (c, e; 8)
(a, c; 10), (c, d; 11)

Based on the definition of links and link weights outlined above, we thus obtain the following
four weighted second-order links:

$\omega(a - c, c - e) = 1$
$\omega(b - c, c - d) = 1$
$\omega(b - c, c - e) = 1$
$\omega(a - c, c - d) = 1$

The resulting second-order network is depicted in Fig. 2(a). Applying the same methodology
to the temporal network $G_2$ shown in Fig. 1(b), we obtain the following four time-respecting
paths of length two:

(a, c; 1), (c, e; 2)
(b, c; 3), (c, d; 4)
(b, c; 7), (c, d; 8)
(a, c; 10), (c, e; 11)

from which we obtain the following two weighted second-order links:

$\omega(a - c, c - e) = 2$
$\omega(b - c, c - d) = 2$

The resulting second-order aggregate network is shown in Fig. 2(b). Here we observe that even
though the two temporal networks $G_1$ and $G_2$ only differ in the order of two time-stamped
links, the resulting second-order aggregate network is markedly different. The second-order
network of $G_1$ indicates time-respecting paths connecting node a to both nodes e and d (both
paths passing via node c). In particular, this corresponds to the connectivity that we would
expect based on the transitivity of static paths in the first-order aggregate network shown
in Fig. 1(c). The second-order network shown in Fig. 2(b) reveals that the transitive path
(a, c), (c, d) in the first-order aggregate network does not translate to a time-respecting path
in the temporal network $G_2$.

Figure 2: Second-order aggregate networks $G^{(2)}$ corresponding to the two temporal networks
shown in Fig. 1. Panels: (a) temporal network $G_1$; (b) temporal network $G_2$.
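The link weights derived in this example can be reproduced with a short script. The sketch below is a minimal illustration of the construction in Section 3.1 for general $k$, not the authors' code: it enumerates all time-respecting paths of length $k$ by brute force, treats the bound $\delta$ as inclusive, and uses hypothetical names throughout. A $k$-th order node is represented as a tuple of $k$ node labels.

```python
from collections import Counter

G1 = [("a", "c", 1), ("c", "e", 2), ("b", "c", 3), ("c", "d", 4),
      ("b", "c", 7), ("c", "e", 8), ("a", "c", 10), ("c", "d", 11)]
G2 = [("a", "c", 1), ("c", "e", 2), ("b", "c", 3), ("c", "d", 4),
      ("b", "c", 7), ("c", "d", 8), ("a", "c", 10), ("c", "e", 11)]


def higher_order_weights(links, k, delta):
    """Weights of k-th order links: for every sequence of k + 1 nodes, count
    the time-respecting paths of length k that realize it with consecutive
    time gaps in (0, delta]."""
    unique = set(links)
    # Start with all paths of length one, then extend them link by link.
    paths = [[link] for link in unique]
    for _ in range(k - 1):
        paths = [p + [nxt] for p in paths for nxt in unique
                 if nxt[0] == p[-1][1] and 0 < nxt[2] - p[-1][2] <= delta]
    weights = Counter()
    for p in paths:
        seq = tuple([p[0][0]] + [w for (_, w, _) in p])
        # Link between the k-th order nodes seq[:-1] and seq[1:].
        weights[(seq[:-1], seq[1:])] += 1
    return weights


second_G1 = higher_order_weights(G1, k=2, delta=1)
second_G2 = higher_order_weights(G2, k=2, delta=1)
```

For `k=1` the same function reduces to the weighted first-order aggregation, which is identical for both networks; only from `k=2` onward do the two networks become distinguishable.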

Clearly, the second-order aggregate networks illustrated above are only a special, particularly
simple type of general, higher-order aggregate networks. Nevertheless, in the following
section we will demonstrate that they contain important information about the causal topology
of temporal networks which can help us in the analysis of temporal networks.

In what follows, we will thus provide an in-depth study of second-order aggregate
representations of six empirical data sets that will be introduced in the following section. Here, we
will particularly focus on the question of how second-order aggregate networks can foster the
calculation of approximate measures for path-based node centralities in temporal networks.


4 Temporal Node Centralities in Second-Order Aggregate Networks

Having introduced the abstraction of higher-order aggregate networks in section 3, let us now
demonstrate the use of a second-order aggregate representation for the study of path-based
centralities in temporal networks. We will study this question using the following six publicly
available empirical data sets representing different types of temporal networks: (AN)
covers time-stamped antenna-antenna interactions inferred from filming ants in an ant
colony [2]; (EM) represents time-stamped E-Mail exchanges between employees in a manufacturing
company [17]; (HO) covers time-stamped proximity interactions between patients and
medical staff in a hospital [30]; (RM) is based on time-stamped social interactions between
students and academic staff at a university campus [4]; (LT) has been reconstructed from
data on passenger itineraries in the London Tube metro system available through the Rolling
Origin and Destination Survey of Transport for London [5]; and (FL) was constructed
from data on flight itineraries of passengers on domestic flights in the United States
available from the Bureau of Transportation Statistics [1]. A detailed description of the
processing of these data sets and the extraction of time-stamped network data is available
in [25], which is why we omit an elaborate discussion here.

Regarding the choice of a reasonable maximum time difference $\delta$ for the notion of shortest
time-respecting paths as discussed in Section 2, we emphasize that this parameter
needs to be adapted to the inherent time scale of the network evolution in each of the
six data sets individually. In general, such a choice is non-trivial as it heavily influences
i) whether or not pairs of nodes can reach each other, and ii) to what extent temporal
characteristics influence the structure of time-respecting paths. In particular, for too small
values of $\delta$ the definition of time-respecting paths is likely to be too restrictive and almost
no paths will be found. Conversely, choosing too large a value for $\delta$ means
that we effectively "aggregate" the time-stamped sequence of links, thus discarding
information about the detailed ordering and timing of links. For each of the
six data sets individually, we have thus chosen the minimum parameter $\delta$ for which we still
obtain a topology of time-respecting paths that is strongly connected, thus ensuring that we
can compute reasonable measures of path-based centralities while retaining as much of the
temporal characteristics as possible (cf. details in [25]).
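The selection rule above can be sketched in a few lines. This is only an illustration under stated assumptions, not the procedure used for the six data sets (which is described in [25]): we assume that a time-respecting path is a sequence of links in which consecutive links are separated by a time difference in the interval $(0, \delta]$, and we read "strongly connected" as "every ordered pair of distinct nodes is joined by some time-respecting path". The function names and the toy link sequence are hypothetical.

```python
def reachable_pairs(links, delta):
    """Ordered pairs (u, w) joined by a time-respecting path with gaps in (0, delta]."""
    reach = set()
    for start, (source, _, _) in enumerate(links):
        # Depth-first search over links: link i can be followed by link j
        # if j starts where i ends and occurs at most delta time units later.
        seen = {start}
        stack = [start]
        while stack:
            i = stack.pop()
            _, v_i, t_i = links[i]
            reach.add((source, v_i))
            for j, (u_j, _, t_j) in enumerate(links):
                if j not in seen and u_j == v_i and 0 < t_j - t_i <= delta:
                    seen.add(j)
                    stack.append(j)
    return reach


def min_delta_for_strong_connectivity(links):
    """Smallest delta for which all ordered node pairs are mutually reachable (or None)."""
    nodes = {n for u, v, _ in links for n in (u, v)}
    required = {(u, w) for u in nodes for w in nodes if u != w}
    times = sorted({t for _, _, t in links})
    # Only time differences that actually occur can change the set of paths.
    candidates = sorted({t2 - t1 for t1 in times for t2 in times if t2 > t1})
    for delta in candidates:
        if required <= reachable_pairs(links, delta):
            return delta
    return None


# Hypothetical toy sequence of time-stamped links (source, target, time).
example_links = [("a", "b", 1), ("b", "c", 3), ("c", "a", 4), ("a", "b", 6)]
min_delta = min_delta_for_strong_connectivity(example_links)
```

For the toy sequence, $\delta = 1$ leaves the pair (a, c) unconnected, while $\delta = 2$ connects all ordered pairs. Since a larger $\delta$ can only add paths, the linear scan over candidate values could be replaced by a binary search for longer sequences.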

In the remainder of this section, we will focus our analysis on three widely adopted
path-based notions of centrality, namely i) betweenness, ii) closeness and iii) reach centrality. The
rationale behind this choice is that all three measures can easily be computed based
on paths in time-aggregated networks, while they additionally facilitate a straightforward
extension to temporal networks based on the notion of shortest time-respecting paths (cf.
similar extensions studied in earlier works). In the following, we first formally define the
temporal betweenness, closeness and reach centrality of nodes. We then compute the resulting
measures for all nodes based on the actual shortest time-respecting paths in the time-stamped
link sequences in our six data sets (using the individually determined maximum time
difference $\delta$). The resulting centrality scores are considered as the ground truth against which
we then compare the centrality scores resulting from the application of the same centrality
measures to i) the commonly used (first-order) time-aggregated representation, and ii) a
second-order aggregate network representation of the corresponding temporal network.

4.1 Temporal Betweenness Centrality 

We first address the question to what extent the temporal betweenness centrality of nodes
in a temporal network can be approximated by means of static betweenness centralities
calculated based on static, time-aggregated representations. To this end, we first formally
define the temporal betweenness centrality of a node in a temporal network. According to
the common definition, the (unnormalized) betweenness centrality of a node v is simply
calculated as the total number of shortest paths passing through node v [8]. Highlighting
the fact that we can directly apply this measure to first-order time-aggregated networks, we
thus define the first-order betweenness centrality $\mathrm{BC}^{(1)}(v)$ of a node v as

$$\mathrm{BC}^{(1)}(v) := \sum_{u \neq v \neq w} |P^{(1)}(u, w; v)| \qquad (2)$$

where $P^{(1)}(u, w; v)$ denotes the set of those shortest paths from node u to w in a static
network that pass through node v.
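A minimal sketch of Eq. 2 for small directed networks is given below. It enumerates all shortest paths explicitly, which is only practical for toy examples. The edge list is an assumption: it is the first-order network one obtains from the four paths a-c-e, a-c-d, b-c-d and b-c-e used as the running example in this section, and the function names are hypothetical.

```python
from collections import deque
from itertools import permutations


def all_shortest_paths(adj, source, target):
    """Return every shortest path (list of nodes) from source to target."""
    dist = {source: 0}
    preds = {source: []}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for nxt in adj.get(node, ()):
            if nxt not in dist:
                dist[nxt] = dist[node] + 1
                preds[nxt] = [node]
                queue.append(nxt)
            elif dist[nxt] == dist[node] + 1:
                # A second predecessor on an equally short path.
                preds[nxt].append(node)
    if target not in dist:
        return []

    def backtrack(node):
        if node == source:
            return [[source]]
        return [path + [node] for p in preds[node] for path in backtrack(p)]

    return backtrack(target)


def first_order_betweenness(adj, nodes):
    """Unnormalized betweenness: number of shortest paths passing through each node."""
    bc = {v: 0 for v in nodes}
    for u, w in permutations(nodes, 2):
        for path in all_shortest_paths(adj, u, w):
            for v in path[1:-1]:  # interior nodes only
                bc[v] += 1
    return bc


# Assumed first-order aggregate network of the running example.
adj = {"a": ["c"], "b": ["c"], "c": ["d", "e"]}
nodes = ["a", "b", "c", "d", "e"]
bc1 = first_order_betweenness(adj, nodes)
```

For this network the four shortest paths a-c-d, a-c-e, b-c-d and b-c-e all pass through c, so only node c receives a non-zero score.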

Applying this idea to temporal networks, a straightforward way to define the temporal
betweenness centrality of a node is to count all shortest time-respecting paths passing through
it. However, as mentioned in Section 2, temporal networks introduce the complication
that, in order to unambiguously define shortest time-respecting paths, we need to include a
start time $t_0$ starting from which time-respecting paths are to be considered. For each pair
of nodes u, v and each start time $t_0$ we can thus directly define an instantaneous distance
function for a temporal network as

$$\mathrm{dist}^{\mathrm{temp}}(u, v, t_0) := \mathrm{len}(p), \quad p \in P^{\mathrm{temp}}(u, v, t_0) \qquad (3)$$

where $P^{\mathrm{temp}}(u, v, t_0)$ denotes the set of shortest time-respecting paths from u to v that
start at time $t_0$ (and which are consistent with a given maximum time difference $\delta$). Based
on this instantaneous definition of shortest time-respecting paths, we can further define a
distance function that gives the minimum distance across any start time as follows:

$$\mathrm{dist}^{\mathrm{temp}}(u, v) := \min_{t_0} \mathrm{dist}^{\mathrm{temp}}(u, v, t_0) \qquad (4)$$

With this we can further define the set of shortest time-respecting paths across all start times
as

$$P^{\mathrm{temp}}(u, v) := \bigcup_{t_0} \{p \in P^{\mathrm{temp}}(u, v, t_0) \mid \mathrm{len}(p) = \mathrm{dist}^{\mathrm{temp}}(u, v)\} \qquad (5)$$

i.e. we only consider those (instantaneous) shortest time-respecting paths whose lengths
correspond to the minimum shortest time-respecting length across all possible start times.
We can now define the temporal betweenness centrality $\mathrm{BC}^{\mathrm{temp}}(v)$ of a node v in analogy to
Eq. 2 as

$$\mathrm{BC}^{\mathrm{temp}}(v) := \sum_{u \neq v \neq w} |P^{\mathrm{temp}}(u, w; v)| \qquad (6)$$

where $P^{\mathrm{temp}}(u, w; v)$ denotes the set of those shortest time-respecting paths across all start
times which connect node u to w and which pass through node v.

Let us illustrate this definition using the temporal networks shown in Fig. 1(a) and
Fig. 1(b). Applying the static betweenness centrality as defined in Eq. 2 to the first-order
aggregate network shown in Fig. 1(c), we find that for node c we have $\mathrm{BC}^{(1)}(c) = 4$, while
all other nodes have a betweenness centrality of zero. Again assuming $\delta = 1$, for the
temporal betweenness centrality of node c in network $G_1$ shown in Fig. 1(a) we find that
indeed four shortest time-respecting paths pass through node c, i.e. we have $\mathrm{BC}^{\mathrm{temp}}(c) = 4$,
while we again have a zero temporal betweenness centrality for all other nodes. Notably,
in this particular case the temporal betweenness centralities of nodes correspond to the
betweenness centralities of nodes calculated based on the first-order time-aggregated network.
This happens because all paths in the first-order aggregate network have a counterpart in
terms of a shortest time-respecting path.

However, in Section 2 we have seen that, in general, shortest time-respecting paths in
temporal networks may not coincide with shortest paths in the (first-order) time-aggregated
network. As a consequence, the temporal betweenness centralities of nodes may differ from
the first-order betweenness centralities calculated from a static, first-order aggregate
representation. This can be seen for the temporal network $G_2$ shown in Fig. 1(b). Based on
the temporal sequence of time-stamped links, here we find only two different shortest
time-respecting paths passing through node c, namely one connecting node a via c to e and a
second one connecting node b via c to d. The two additional shortest time-respecting paths
found in $G_1$ are absent in $G_2$; therefore in $G_2$ node c has a temporal betweenness centrality
$\mathrm{BC}^{\mathrm{temp}}(c) = 2$, thus being, at least from the perspective of temporal betweenness centrality,
less important than in $G_1$.
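The difference between the two examples can be reproduced with a short sketch of Eqs. 3 to 6. Two assumptions are made here and should be read as such: first, a time-respecting path is taken to be a sequence of links whose consecutive time stamps differ by a value in $(0, \delta]$; second, the time stamps below are hypothetical and merely chosen such that the resulting paths match those described in the text, since the actual link sequences of Fig. 1 are not reproduced in this section.

```python
from collections import defaultdict


def shortest_time_respecting_paths(links, delta):
    """Map (u, w) to the set of shortest time-respecting paths over all start times."""
    # The frontier holds (node sequence, time of last link) for all simple
    # time-respecting paths of the current length.
    frontier = {((u, v), t) for (u, v, t) in links if u != v}
    best = {}  # (u, w) -> (length, set of node sequences)
    length = 1
    while frontier:
        for path, _ in frontier:
            key = (path[0], path[-1])
            if key not in best:
                best[key] = (length, set())
            if best[key][0] == length:
                best[key][1].add(path)
        extended = set()
        for path, t_last in frontier:
            for (u, v, t) in links:
                if u == path[-1] and v not in path and 0 < t - t_last <= delta:
                    extended.add((path + (v,), t))
        frontier = extended
        length += 1
    return {key: paths for key, (_, paths) in best.items()}


def temporal_betweenness(links, delta):
    """Number of shortest time-respecting paths passing through each node."""
    bc = defaultdict(int)
    for paths in shortest_time_respecting_paths(links, delta).values():
        for path in paths:
            for v in path[1:-1]:
                bc[v] += 1
    return dict(bc)


# Hypothetical time stamps: all four paths through c are time-respecting.
g1 = [("a", "c", 1), ("c", "e", 2), ("b", "c", 3), ("c", "d", 4),
      ("a", "c", 5), ("c", "d", 6), ("b", "c", 7), ("c", "e", 8)]
# Hypothetical time stamps: only a-c-e and b-c-d are time-respecting.
g2 = [("a", "c", 1), ("c", "e", 2), ("b", "c", 3), ("c", "d", 4)]

bc_g1 = temporal_betweenness(g1, delta=1)
bc_g2 = temporal_betweenness(g2, delta=1)
```

Both link sequences aggregate to the same first-order topology, yet node c lies on four shortest time-respecting paths in the first sequence and on only two in the second. The enumeration of simple paths is exponential in the worst case and is meant for illustration only.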

In the following we study the question to what extent first-order betweenness centralities
can be used as a proxy for the temporal betweenness centralities of nodes in our six data sets
of real-world temporal networks. In particular, we study this question in the following way:
For each node v in the six data sets we calculate i) the first-order betweenness centrality
$\mathrm{BC}^{(1)}(v)$ based on the first-order aggregate network, as well as ii) the (ground truth)
temporal betweenness centrality $\mathrm{BC}^{\mathrm{temp}}(v)$ based on actual shortest time-respecting paths in
the temporal network. We then assess the correlation between both measures by computing
the Pearson correlation coefficient (as well as the corresponding p-value) for the sequence of
paired values $(\mathrm{BC}^{(1)}(v), \mathrm{BC}^{\mathrm{temp}}(v))$ for all nodes $v \in V$.

                     BC^temp ~ BC^(1)                     BC^temp ~ BC^(2)
                     Pearson           Kendall-Tau        Pearson           Kendall-Tau
E-Mail (EM)          0.80 (3.29e-22)   0.73 (8.36e-26)    0.97 (7.52e-60)   0.74 (1.11e-26)
Ants (AN)            0.82 (3.49e-16)   0.64 (2.05e-13)    0.80 (1.96e-14)   0.59 (1.94e-11)
Hospital (HO)        0.93 (2.39e-23)   0.81 (1.18e-17)    0.96 (2.36e-30)   0.87 (5.55e-20)
RealityMining (RM)   0.95 (2.83e-30)   0.62 (7.28e-12)    0.93 (3.74e-26)   0.75 (1.12e-16)
London Tube (LT)     0.85 (2.58e-37)   0.66 (1.22e-29)    0.87 (3.28e-42)   0.71 (9.32e-34)
Flights (FL)         0.99 (6.91e-108)  0.66 (9.09e-26)    0.99 (2.66e-98)   0.65 (4.25e-25)

Table 1: Pearson and Kendall-Tau rank correlation coefficients between temporal betweenness
centrality (ground truth) and betweenness centrality calculated based on the first-order
aggregate network and the second-order aggregate network. Values in parentheses indicate the
p-value.

Since centrality scores of nodes in networks are often used and interpreted in a relative
fashion, we further perform an additional analysis that is insensitive to variations in the
actual centrality values which do not affect the relative importance of nodes. For
this, we first rank nodes according to their temporal and first-order betweenness
centralities respectively. We then calculate the Kendall-Tau rank correlation coefficient in order to
quantitatively assess to what extent nodes are ranked similarly according to both notions of
centrality (even though the actual centrality values for these nodes may differ).
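The two comparisons can be sketched in plain Python as follows. The tie-corrected tau-b variant of the Kendall coefficient is used here as an assumption, since the text does not state which variant was computed; p-values are omitted. The score vectors are hypothetical and only serve to show how the two coefficients react differently to a single extreme value.

```python
from itertools import combinations
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


def kendall_tau_b(xs, ys):
    """Kendall rank correlation with the tau-b correction for ties."""
    concordant = discordant = only_x_tied = only_y_tied = 0
    for (x1, y1), (x2, y2) in combinations(zip(xs, ys), 2):
        dx, dy = x1 - x2, y1 - y2
        if dx == 0 and dy == 0:
            continue  # tied in both sequences: ignored by tau-b
        if dx == 0:
            only_x_tied += 1
        elif dy == 0:
            only_y_tied += 1
        elif (dx > 0) == (dy > 0):
            concordant += 1
        else:
            discordant += 1
    pairs = concordant + discordant
    return (concordant - discordant) / sqrt((pairs + only_x_tied) * (pairs + only_y_tied))


# Hypothetical centrality scores of five nodes.
first_order = [1.0, 2.0, 3.0, 4.0, 10.0]
temporal = [2.0, 1.0, 3.0, 4.0, 5.0]

rho = pearson(first_order, temporal)
tau = kendall_tau_b(first_order, temporal)
```

In this toy example exactly one of the ten node pairs is ranked differently, which gives a rank correlation of 0.8 regardless of how large the score of the last node is, whereas the Pearson coefficient depends on the actual values.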

The results of this analysis are shown in the left column of Table 1, in which we report both
the Pearson as well as the Kendall-Tau rank correlation coefficients between the temporal
and the first-order betweenness centralities of nodes for each of the six data sets introduced
above. Here, a first interesting result is that both the Pearson and the Kendall-Tau rank
correlation coefficients exhibit a large variation between 0.75 and 0.99, as well as 0.59 and
0.81 respectively. The results indicate that, depending on the characteristics of the underlying
temporal network, temporal betweenness centralities can be reasonably well approximated
by first-order betweenness centrality for some data sets (e.g., for (FL), (HO), (RM)) while
such an approximation should be taken with caution for other data sets.

Based on these results it is reasonable to ask if we can better approximate temporal
centrality, especially for those data sets where the correlation between the first-order and the
temporal betweenness centrality is comparably weak. In Section 3 we have argued that the
generalization of higher-order aggregate networks allows us to construct static representations
of temporal networks that capture both temporal and topological characteristics that emerge
from the ordering of links and the statistics of time-respecting paths. Focusing on a
second-order representation, in the remainder of this section we will study to what extent
second-order aggregate networks can be used in the analysis of temporal node centralities.

Figure 3: Simple example for a second-order aggregate network

Importantly, such an analysis is facilitated by the fact that second-order aggregate
networks are static networks, which allows for a straightforward application of standard
centrality measures to the second-order topology. In the case of second-order aggregate networks,
applying standard centrality measures we obtain centrality values for higher-order nodes
$(v, w)$, each of the higher-order nodes being a k-tuple of nodes in the first-order network.
In order to arrive at a centrality measure for the original (first-order) nodes, we thus must
project this measure to the level of nodes in the first-order network.

Luckily, this can be done in a simple way which we outline in the following: For a
second-order network $G^{(2)} = (V^{(2)}, E^{(2)})$, let us first define a second-order distance
function $\mathrm{dist}^{(2)}(v, w)$ which, for each pair of first-order nodes $v, w \in V^{(1)}$, gives the length of a
shortest path based on the topology of the second-order aggregate network as

$$\mathrm{dist}^{(2)}(v, w) := \min_{\substack{x, y \in V^{(2)} \\ x = (v - \ast) \\ y = (\ast - w)}} L^{(2)}(x, y) + 1 \qquad (7)$$

where $L^{(2)}(x, y)$ denotes the length of a shortest path between the second-order nodes
$x, y \in V^{(2)}$. The rationale behind this definition is that in the second-order aggregate network, we
can have multiple shortest paths with different lengths between different second-order nodes,
which nevertheless map to paths between a single pair of first-order nodes. As an example,
consider the two first-order nodes a and d in the simple second-order network shown in Fig. 3.
Here we observe that, from the perspective of second-order nodes, both $(a-b, b-d)$ as well
as $(a-b, b-c), (b-c, c-d)$ are shortest paths (between different pairs of nodes) in the
second-order network with lengths $L^{(2)}(a-b, b-d) = 1$ and
$L^{(2)}(a-b, c-d) = 2$ respectively. However, from the perspective of first-order nodes both of
these second-order paths connect node a to node d (via paths of length 2 and 3 respectively).
Using the definition from Eq. 7 thus allows us to correctly calculate the second-order distance
between a and d as $\mathrm{dist}^{(2)}(a, d) = L^{(2)}(a-b, b-d) + 1 = 2$.
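Eq. 7 translates into a breadth-first search over second-order nodes, followed by a minimum over all start nodes of the form (v, *) and end nodes of the form (*, w). The sketch below is a hedged illustration: second-order nodes are represented as pairs of first-order nodes, and the edge list is inferred from the two second-order paths named in the text rather than taken from the figure itself.

```python
from collections import deque


def second_order_distance(edges2, v, w):
    """dist^(2)(v, w): minimum of L^(2)(x, y) + 1 over x = (v, *) and y = (*, w)."""
    adj = {}
    nodes2 = set()
    for x, y in edges2:
        adj.setdefault(x, []).append(y)
        nodes2.update((x, y))
    best = float("inf")
    for x in nodes2:
        if x[0] != v:
            continue
        # Breadth-first search in the second-order network, starting at x.
        hops = {x: 0}
        queue = deque([x])
        while queue:
            cur = queue.popleft()
            for nxt in adj.get(cur, ()):
                if nxt not in hops:
                    hops[nxt] = hops[cur] + 1
                    queue.append(nxt)
        for y, d in hops.items():
            if y[1] == w:
                best = min(best, d + 1)
    return best


# Second-order links inferred from the paths (a-b, b-d) and (a-b, b-c), (b-c, c-d).
fig3_edges = [
    (("a", "b"), ("b", "d")),
    (("a", "b"), ("b", "c")),
    (("b", "c"), ("c", "d")),
]
dist_a_d = second_order_distance(fig3_edges, "a", "d")
```

The search from the second-order node a-b reaches b-d after one step and c-d after two steps; the minimum over both end nodes, plus one, gives the distance between a and d. Pairs that are not connected in the second-order network receive an infinite distance in this sketch.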

The above definition of a second-order distance function now allows us to define a
second-order betweenness centrality $\mathrm{BC}^{(2)}(v)$ of a node v based on Eq. 2. For this, we simply count
all second-order shortest paths between two nodes u and w which i) pass through node v, and
ii) whose length corresponds to the second-order distance $\mathrm{dist}^{(2)}(u, w)$. Formally, we define

$$\mathrm{BC}^{(2)}(v) := \sum_{u \neq v \neq w} \; \sum_{\substack{x, y \in V^{(1)} \\ u - x \in V^{(2)} \\ y - w \in V^{(2)}}} |\{p \in P^{(2)}(u - x, y - w; v) : \mathrm{len}(p) = \mathrm{dist}^{(2)}(u, w)\}| \qquad (8)$$

where, in analogy to $P^{(1)}(u, w; v)$ above, $P^{(2)}(u - x, y - w; v)$ denotes the set of all shortest
paths in the second-order network that connect node $u - x$ to $y - w$ and that pass through
a first-order node v.

With this, we have defined a second-order betweenness centrality which allows us to calculate
node centralities in a way that incorporates the causal topology as captured by the
second-order aggregate network. Let us again illustrate this approach using the simple examples
shown in Fig. 1. For the temporal network $G_1$ we can compute a second-order betweenness
centrality based on the second-order network shown in Fig. 2(a). Here we observe a total of
four shortest paths between pairs of nodes in the second-order network, namely:

    (a-c, c-e)
    (a-c, c-d)
    (b-c, c-d)
    (b-c, c-e)

For each node in the first-order network, we can now count the number of second-order
shortest paths that it is on, obtaining $\mathrm{BC}^{(2)}(c) = 4$ while $\mathrm{BC}^{(2)}(x) = 0$ for all nodes $x \neq c$. In
this particular case, the second-order betweenness centrality values exactly correspond both
to the temporal as well as the first-order betweenness centralities. Again, this is different
for the temporal network $G_2$ shown in Fig. 1(b). Considering the second-order aggregate
network shown in Fig. 2(b), we only find the following two shortest paths in the
second-order aggregate network

    (b-c, c-d)
    (a-c, c-e)

thus obtaining $\mathrm{BC}^{(2)}(c) = 2$. Here, we find that while the second-order betweenness
centralities in $G_2$ correspond to the temporal betweenness centralities, they differ from those
calculated from the first-order aggregate network. The reason for this is that in the example
$G_2$ shortest time-respecting paths of length two differ from what we would expect based on
the first-order network.
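The counting argument for the two examples can be checked with a small sketch of Eq. 8. It is written under two assumptions: a second-order path is mapped to the first-order node sequence it traverses, and its length is measured in first-order links so that it can be compared with $\mathrm{dist}^{(2)}$. The second-order link lists are those enumerated in the text; the function names are hypothetical.

```python
from collections import defaultdict


def bfs_shortest_paths(adj, source):
    """Map each reachable second-order node to all shortest paths from source."""
    paths = {source: [(source,)]}
    frontier = [source]
    while frontier:
        found = defaultdict(list)
        for node in frontier:
            for nxt in adj.get(node, ()):
                if nxt not in paths:
                    found[nxt].extend(p + (nxt,) for p in paths[node])
        paths.update(found)
        frontier = list(found)
    return paths


def second_order_betweenness(edges2):
    """Count, per first-order node, the second-order shortest paths passing through it."""
    adj = defaultdict(list)
    nodes2 = set()
    for x, y in edges2:
        adj[x].append(y)
        nodes2.update((x, y))
    # Collect candidate first-order node sequences for each pair (u, w).
    candidates = defaultdict(set)
    for x in nodes2:
        for paths in bfs_shortest_paths(adj, x).values():
            for p in paths:
                # ((a, c), (c, e)) corresponds to the first-order sequence (a, c, e).
                seq = (p[0][0],) + tuple(link[1] for link in p)
                candidates[(seq[0], seq[-1])].add(seq)
    bc = defaultdict(int)
    for (u, w), seqs in candidates.items():
        if u == w:
            continue
        shortest = min(len(s) for s in seqs)  # realizes dist^(2)(u, w)
        for s in seqs:
            if len(s) == shortest:
                for v in s[1:-1]:
                    bc[v] += 1
    return dict(bc)


g1_second_order = [
    (("a", "c"), ("c", "e")), (("a", "c"), ("c", "d")),
    (("b", "c"), ("c", "d")), (("b", "c"), ("c", "e")),
]
g2_second_order = [(("b", "c"), ("c", "d")), (("a", "c"), ("c", "e"))]

bc2_g1 = second_order_betweenness(g1_second_order)
bc2_g2 = second_order_betweenness(g2_second_order)
```

Node c is the only interior node in both cases, with four and two second-order shortest paths respectively, which matches the temporal betweenness centralities discussed above.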

We emphasize that the exact correspondence between the second-order and the temporal
betweenness centralities in the examples discussed above is because we have no shortest
time-respecting paths of length three or longer, whose presence could differ from what we
expect based on the second-order network. To what extent this affects the applicability
of second-order aggregate networks in real-world scenarios is not clear and thus requires
further investigation. In the following, we thus study to what extent second-order betweenness
centrality can be used to approximate the temporal betweenness centralities of nodes in the
six real-world data sets studied above. For this, we first construct a second-order aggregate
network as introduced in Section 3. We then calculate the betweenness centrality values
$\mathrm{BC}^{(2)}(v)$ of all nodes v as described above, comparing the resulting centralities with the
(ground-truth) temporal betweenness centralities $\mathrm{BC}^{\mathrm{temp}}(v)$.

The results of this analysis are shown in the right column of Table 1. Here we find that
for most of the data sets, second-order betweenness centralities are correlated with the true,
temporal betweenness centralities in a stronger way than the corresponding first-order
approximation of betweenness centrality. For the (EM) data set capturing E-Mail exchanges
between employees in a manufacturing company, we observe an increase of the Pearson
correlation coefficient $\rho$ from 0.80 to 0.97, while the associated Kendall-Tau rank correlation
coefficient $\tau$ increases rather mildly from 0.73 to 0.74. We attribute this to the fact that the
second-order aggregate network better captures the structures of time-respecting paths in
the temporal network compared to the first-order network. For the two data sets (HO) and
(LT) we observe a similar increase both in the Pearson and the Kendall-Tau rank correlation
coefficients, while the values remain largely unchanged for the (FL) data set. In particular,
for the latter data set the first-order betweenness centrality already exhibits a correlation
coefficient of 0.99 which indicates that in this particular case temporal characteristics do
not significantly alter the structure of shortest time-respecting paths. For the two data sets
(AN) and (RM) we observe a small decrease in the Pearson correlation values for the
second-order approximation. Notably, for (RM) the decrease from 0.97 to 0.95 is accompanied by an
increase of the Kendall-Tau coefficient from 0.62 to 0.75. This indicates that, even though
the actual values of second-order betweenness centralities may be less correlated with
temporal betweenness centralities than the first-order betweenness centralities, the second-order
betweenness centralities provide us with a significantly better perspective on the relative
importance of nodes.

Finally, for the (AN) data set we note that both the Pearson and the Kendall-Tau rank
correlation coefficients are worse for the second-order betweenness centralities. While the
interesting question in what respect the temporal characteristics of (AN) differ from those
of the other temporal networks remains to be investigated in more detail, we expect this
result to be related to non-stationary properties. We particularly observe that some of the
nodes (i.e. ants) are only active during certain phases of the observation period. This imposes
a natural ordering of interactions which particularly prevents nodes which are only active
during an early phase from being reachable from nodes which are only active at a later phase.

4.2 Temporal Closeness Centrality

Let us now turn our attention to closeness centrality, which captures a node's average distance
to all other nodes in a network. For a directed, static (first-order aggregate) network the
closeness centrality of a node v is commonly defined as

$$\mathrm{ClC}^{(1)}(v) = \sum_{u \neq v} \frac{1}{\mathrm{dist}^{(1)}(u, v)} \qquad (9)$$

where the distance function $\mathrm{dist}^{(1)}(u, v)$ denotes the distance, i.e. the length of a shortest
path, from node u to v in the first-order aggregate network.

We can easily define a temporal version of closeness centrality based on the temporal
distance function $\mathrm{dist}^{\mathrm{temp}}(u, v)$ which we have defined in Eq. 4 in the context of temporal
betweenness centrality. Here, we remind the reader that the function $\mathrm{dist}^{\mathrm{temp}}(u, v)$ captures the minimum
length of a shortest time-respecting path across all possible start times $t_0$. Using this
temporal distance function, we can apply the standard definition in Eq. 9 and define the temporal
closeness centrality of a node v in a temporal network as

$$\mathrm{ClC}^{\mathrm{temp}}(v) = \sum_{u \neq v} \frac{1}{\mathrm{dist}^{\mathrm{temp}}(u, v)} \qquad (10)$$

Let us again illustrate this definition using the temporal networks shown in Fig. 1. Node
e in the temporal network $G_1$ shown in Fig. 1(a) can be reached from nodes a and b
via two shortest time-respecting paths of length two, as well as from node c via a
shortest time-respecting path of length one. For the temporal closeness centrality, we thus find
$\mathrm{ClC}^{\mathrm{temp}}(e) = 2$. It is easy to confirm that this corresponds to the first-order closeness
centrality of node e. Again a mere reordering of links can change the closeness centralities of
nodes, as can be seen in the temporal network $G_2$ shown in Fig. 1(b). Here, we see that node
e can only be reached from node a via a shortest time-respecting path of length two, as well
as from node c via a shortest time-respecting path of length one. For node e in the
temporal network $G_2$ we thus find a temporal closeness centrality $\mathrm{ClC}^{\mathrm{temp}}(e) = 1.5$, highlighting
that it is, at least from the perspective of closeness centrality, less "important" than in the
temporal network $G_1$.
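Written out term by term, the two values follow from Eq. 10 as shown below. The only assumption added here is the usual convention that nodes which cannot reach e via any time-respecting path contribute $1/\infty = 0$ to the sum; the subscripts $G_1$ and $G_2$ are merely used to distinguish the two examples.

```latex
\begin{align*}
\mathrm{ClC}^{\mathrm{temp}}_{G_1}(e)
  &= \frac{1}{\mathrm{dist}^{\mathrm{temp}}(a, e)}
   + \frac{1}{\mathrm{dist}^{\mathrm{temp}}(b, e)}
   + \frac{1}{\mathrm{dist}^{\mathrm{temp}}(c, e)}
   = \frac{1}{2} + \frac{1}{2} + 1 = 2 \\
\mathrm{ClC}^{\mathrm{temp}}_{G_2}(e)
  &= \frac{1}{\mathrm{dist}^{\mathrm{temp}}(a, e)}
   + \frac{1}{\mathrm{dist}^{\mathrm{temp}}(c, e)}
   = \frac{1}{2} + 1 = 1.5
\end{align*}
```

The missing term for node b in the second line is exactly the contribution that is lost through the reordering of links.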

Considering the example above we see that, due to the ordering and timing of links,
first-order closeness centralities can be a misleading proxy for the temporal closeness centralities
of nodes in temporal networks. In the following we thus again empirically study this question
using our six data sets on temporal networks. We again use the temporal closeness
centralities $\mathrm{ClC}^{\mathrm{temp}}(v)$ of nodes as the ground truth, then studying whether temporal closeness
centralities can reasonably be approximated by first-order closeness centralities $\mathrm{ClC}^{(1)}(v)$.
The results of this analysis are shown in the left column of Table 2, which reports the
observed Pearson and Kendall-Tau rank correlation coefficients for each of the six data sets.


We observe again that the answer to the question of how well temporal closeness
centralities can be approximated by first-order static closeness centralities depends on the actual
data set. The lowest Pearson correlation coefficient of 0.91 is obtained for the (FL) and the
(AN) data sets, while the highest Pearson correlation coefficient of 0.98 is obtained for (LT).
The lowest Kendall-Tau rank correlation coefficient is 0.75 for (AN), while the highest value
of 0.87 is achieved for (LT). We further observe that, compared to betweenness centralities,
we generally obtain considerably larger correlation values between temporal and first-order
closeness centralities. This can intuitively be explained by the fact that, while temporal
betweenness centralities are influenced by the actual structure of shortest time-respecting
paths, temporal closeness centralities are merely influenced by their lengths. We thus expect
temporal closeness centrality to be insensitive to characteristics of temporal networks that
change the structure of paths but not their lengths, hence explaining the larger correlation
coefficients.

                     ClC^temp ~ ClC^(1)                   ClC^temp ~ ClC^(2)
                     Pearson           Kendall-Tau        Pearson           Kendall-Tau
E-Mail (EM)          0.93 (4.74e-44)   0.79 (4.96e-30)    0.98 (2.52e-71)   0.92 (1.54e-40)
Ants (AN)            0.91 (1.67e-24)   0.75 (1.54e-17)    0.96 (2.05e-35)   0.83 (4.80e-21)
Hospital (HO)        0.96 (2.09e-29)   0.83 (1.88e-18)    0.99 (1.46e-40)   0.90 (1.76e-21)
RealityMining (RM)   0.96 (1.03e-33)   0.77 (1.99e-17)    0.99 (1.64e-51)   0.89 (5.30e-17)
London Tube (LT)     0.98 (1.33e-91)   0.87 (2.57e-49)    0.98 (3.26e-92)   0.87 (1.07e-49)
Flights (FL)         0.91 (3.35e-46)   0.81 (1.88e-18)    0.97 (4.57e-75)   0.93 (9.57e-50)

Table 2: Pearson and Kendall-Tau rank correlation coefficients between temporal closeness
centrality (ground truth) and closeness centrality calculated based on the first-order aggregate
network and the second-order aggregate network. Values in parentheses indicate the p-value.

Let us now study whether we can better approximate temporal closeness centralities using
a generalization which is calculated based on the static, second-order aggregate representation
of a temporal network. For this we first introduce how closeness centralities of nodes can be
calculated based on a second-order aggregate network. We recall that in Eq. 7 we have defined
a second-order distance function $\mathrm{dist}^{(2)}(u, v)$ which provides us with the distance between
(first-order) nodes based on shortest paths in a second-order aggregate network. This distance
function allows us to directly define a second-order closeness centrality $\mathrm{ClC}^{(2)}(v)$ as

$$\mathrm{ClC}^{(2)}(v) = \sum_{u \neq v} \frac{1}{\mathrm{dist}^{(2)}(u, v)} \qquad (11)$$

i.e. for each node v in a network, we simply sum the inverse of the distances to all nodes
according to the topology of the second-order aggregate network.

Again, we illustrate the notion of second-order closeness centrality using the two
illustrative examples of temporal networks shown in Fig. 1. Fig. 2(a) shows the second-order
aggregate network corresponding to the temporal network $G_1$ shown in Fig. 1(a). Here we
find that the second-order node $c-e$ can be reached via two shortest paths

    (b-c), (c-e)
    (a-c), (c-e)

of length one from the second-order nodes $b-c$ and $a-c$. Furthermore, we have an additional
second-order "path" of length zero from node $c-e$ to itself. Using the second-order distance
function as defined in Eq. 7 we thus infer the following values:

    dist^(2)(b, e) = 2
    dist^(2)(a, e) = 2
    dist^(2)(c, e) = 1

from which we calculate the second-order closeness centrality of node e as $\mathrm{ClC}^{(2)}(e) = 2$.

Again, in this particular example the second-order closeness centrality corresponds both
to the temporal and the first-order closeness centrality. This is different in the
second-order network shown in Fig. 2(b), which corresponds to the temporal network $G_2$ shown
in Fig. 1(b). Here, we find that the second-order node $c-e$ can only be reached via a single
shortest path $(a-c), (c-e)$ as well as via an additional second-order "path" of length zero
from $c-e$ to itself. From this, we can calculate the following second-order distances

    dist^(2)(a, e) = 2
    dist^(2)(c, e) = 1

and for the second-order closeness centrality of node e we thus obtain $\mathrm{ClC}^{(2)}(e) = 1.5$, which
coincides with the temporal closeness of node e in the underlying temporal network $G_2$.
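The two calculations above can be reproduced with a short sketch of Eq. 11. It assumes that second-order nodes are represented as pairs of first-order nodes, that every second-order node of interest appears in at least one second-order link, and that unreachable pairs simply contribute nothing to the sum. The link lists are the ones enumerated for the two examples; the function names are hypothetical.

```python
from collections import defaultdict, deque


def second_order_distances(edges2):
    """dist^(2)(u, w) per Eq. 7 for all pairs connected in the second-order network."""
    adj = defaultdict(list)
    nodes2 = set()
    for x, y in edges2:
        adj[x].append(y)
        nodes2.update((x, y))
    dist2 = {}
    for x in nodes2:
        hops = {x: 0}  # the zero-length "path" from x to itself is included
        queue = deque([x])
        while queue:
            cur = queue.popleft()
            for nxt in adj.get(cur, ()):
                if nxt not in hops:
                    hops[nxt] = hops[cur] + 1
                    queue.append(nxt)
        for y, d in hops.items():
            key = (x[0], y[1])
            dist2[key] = min(dist2.get(key, float("inf")), d + 1)
    return dist2


def second_order_closeness(edges2, v):
    """Sum of inverse second-order distances from all other nodes to v."""
    dist2 = second_order_distances(edges2)
    return sum(1.0 / d for (u, w), d in dist2.items() if w == v and u != v)


g1_second_order = [
    (("a", "c"), ("c", "e")), (("a", "c"), ("c", "d")),
    (("b", "c"), ("c", "d")), (("b", "c"), ("c", "e")),
]
g2_second_order = [(("b", "c"), ("c", "d")), (("a", "c"), ("c", "e"))]

clc2_g1 = second_order_closeness(g1_second_order, "e")
clc2_g2 = second_order_closeness(g2_second_order, "e")
```

In the first network node e is reached from a, b and c at distances 2, 2 and 1; in the second network the contribution of b is missing, which lowers the closeness of e from 2 to 1.5.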

Using the second-order closeness centrality introduced above, let us now study the
correlations between the temporal and the second-order closeness centralities of nodes in
our six data sets. The results of this analysis are shown in the right column of Table 2.
For five of the six data sets we observe significantly larger correlation coefficients than those
reported for the first-order closeness centrality in Table 2. The largest increase of the Pearson
correlation coefficient from 0.91 to 0.97 is achieved for the (FL) data set, while we observe
no improvement of the (already large) Pearson correlation coefficient of 0.98 for (LT). We
further observe significant increases in the Kendall-Tau rank correlation coefficients for all
of the studied data sets, except for (LT) for which it remains the same. For the ranking of
nodes in (EM), we find that a ranking based on second-order closeness centralities increases
the Kendall-Tau rank correlation with the ground truth temporal centralities from 0.79 to
0.92, thus better representing the relative importance of nodes in the temporal network.

4.3 Temporal Reach Centrality

Concluding this section we finally study reach centrality, another notion of path-based
centrality that captures the number of nodes that can be reached from a node via paths up to a
given maximum length s [3]. For static networks, such as a first-order aggregate network, we
define the first-order reach centrality of a node v as

$$\mathrm{CoC}^{(1)}(v, s) := \sum_{w \in V} \Theta(\mathrm{dist}^{(1)}(v, w) - s) \qquad (12)$$

where $\Theta(\cdot)$ is the Heaviside function, $\mathrm{dist}^{(1)}(v, w)$ is the length of a shortest path from node
v to w in the static, first-order network, and s is a parameter specifying up to which length
paths should be considered. Clearly, the reach centrality $\mathrm{CoC}^{(1)}(v, s = 1)$ of a node v is
equal to its out-degree, while $\mathrm{CoC}^{(1)}(v, s = \infty)$ is equal to the number of nodes to which v is
connected via directed paths of any length.

A temporal reach centrality can again easily be defined based on the notion of shortest
time-respecting paths, as well as the temporal distance function $\mathrm{dist}^{\mathrm{temp}}(v, w)$ defined in
Eq. 4. Here, for a given maximum time difference $\delta$ and a given value s, we are interested
in how many different nodes can be reached via shortest time-respecting paths which have
at most length s. In analogy to Eq. 12 we can thus define the temporal reach centrality
$\mathrm{CoC}^{\mathrm{temp}}(v, s)$ of a node v as:

$$\mathrm{CoC}^{\mathrm{temp}}(v, s) := \sum_{w \in V} \Theta(\mathrm{dist}^{\mathrm{temp}}(v, w) - s). \qquad (13)$$

We want to highlight that with this definition of reach centrality, we focus on the
temporal-topological characteristics introduced by the ordering of links, which is why we base our
definition on the shortest rather than the fastest time-respecting paths.

It is finally easy to see that a second-order reach centrality can be defined in analogy
to second-order closeness centrality. For this, all we have to do is to replace the distance
function in Eq. 12 by our previously defined second-order distance function, thus obtaining
the following definition:

$$\mathrm{CoC}^{(2)}(v, s) := \sum_{w \in V} \Theta(\mathrm{dist}^{(2)}(v, w) - s). \qquad (14)$$

Using a value of s = 2, we again exemplify these definitions using our two illustrative
examples. Let us first calculate the first-order reach centrality of node a based on the
first-order aggregate network shown in Fig. 1(c). Here we find that there are paths of at most
length s = 2 from node a to the three nodes c, d and e, from which we conclude
$\mathrm{CoC}^{(1)}(a, s = 2) = 3$. For the temporal reach centrality of node a in the temporal network $G_1$ shown in
Fig. 1(a), we observe that there are time-respecting paths of at most length s = 2 from node
a to the three nodes c, e and d. We hence conclude $\mathrm{CoC}^{\mathrm{temp}}(a, s = 2) = 3$, finding that for
$G_1$ the temporal reach centrality again corresponds to the first-order reach centrality. Again,
this is not the case for the temporal network $G_2$ shown in Fig. 1(b). Here, node a is only
connected to the nodes c and e via time-respecting paths of up to length two, which means
that we have $\mathrm{CoC}^{\mathrm{temp}}(a, s = 2) = 2$.

For the second-order reach centrality of node a in the temporal network $G_1$ let us now
consider the second-order aggregate network shown in Fig. 2(a). Based on the shortest paths
in the second-order network, we first find that the node $a-c$ is connected to two nodes
$c-d$ and $c-e$ via shortest paths of length one. Furthermore, we find an additional shortest
path of length zero which connects the second-order node $a-c$ to itself. Again, using our
second-order distance function $\mathrm{dist}^{(2)}$ here we find the distances

    dist^(2)(a, c) = 1
    dist^(2)(a, e) = 2
    dist^(2)(a, d) = 2

from which we conclude that three nodes c, e and d can be reached via paths of length
at most two. From this we calculate the second-order reach centrality of node a in $G_1$ as
$\mathrm{CoC}^{(2)}(a, s = 2) = 3$. Applying the same arguments to the example network $G_2$ and the
corresponding second-order aggregate network shown in Fig. 2(b), for the same three nodes
we find the following second-order distances:

    dist^(2)(a, c) = 1
    dist^(2)(a, e) = 2
    dist^(2)(a, d) = ∞

We thus obtain a second-order reach centrality of
$\mathrm{CoC}^{(2)}(a, s = 2) = 2$ which corresponds to the temporal reach centrality of node a in $G_2$.
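The second-order reach centralities of node a can be verified with the following sketch. Following the worked example, the function counts the nodes w different from v whose second-order distance from v is at most s; this counting convention, the pair representation of second-order nodes and the function names are assumptions made for illustration. The link lists are those given for the two examples.

```python
from collections import defaultdict, deque


def second_order_distances(edges2):
    """dist^(2)(u, w) per Eq. 7 for all pairs connected in the second-order network."""
    adj = defaultdict(list)
    nodes2 = set()
    for x, y in edges2:
        adj[x].append(y)
        nodes2.update((x, y))
    dist2 = {}
    for x in nodes2:
        hops = {x: 0}
        queue = deque([x])
        while queue:
            cur = queue.popleft()
            for nxt in adj.get(cur, ()):
                if nxt not in hops:
                    hops[nxt] = hops[cur] + 1
                    queue.append(nxt)
        for y, d in hops.items():
            key = (x[0], y[1])
            dist2[key] = min(dist2.get(key, float("inf")), d + 1)
    return dist2


def second_order_reach(edges2, v, s):
    """Number of nodes w != v with second-order distance from v of at most s."""
    dist2 = second_order_distances(edges2)
    return sum(1 for (u, w), d in dist2.items() if u == v and w != v and d <= s)


g1_second_order = [
    (("a", "c"), ("c", "e")), (("a", "c"), ("c", "d")),
    (("b", "c"), ("c", "d")), (("b", "c"), ("c", "e")),
]
g2_second_order = [(("b", "c"), ("c", "d")), (("a", "c"), ("c", "e"))]

reach_g1 = second_order_reach(g1_second_order, "a", s=2)
reach_g2 = second_order_reach(g2_second_order, "a", s=2)
```

Pairs that are unreachable in the second-order network, such as (a, d) in the second example, never enter the distance dictionary and are therefore not counted for any finite s.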

In the following, we use the temporal reach centrality defined above as ground truth, 
while studying how well it can be approximated by first-order and second-order reach 
centralities calculated from the first- and second-order time-aggregated networks, respectively. 
Different from the analyses for betweenness and closeness centralities, here we must 
additionally account for the fact that the reach centrality can be calculated for different values of 
the maximum path length s. This implies that the Pearson correlation coefficient ρ and the 
Kendall-Tau rank correlation coefficient τ must be calculated for each value of s individually. 
The results of this analysis are shown in Fig. 4, which shows the obtained values for ρ and 
τ for the correlations between i) the temporal and the first-order reach centralities (black 
lines), and ii) the temporal and the second-order reach centralities (orange lines) for each of 
the six data sets introduced above. Thanks to our choice of the maximum time difference 
δ, for all of our data sets both the underlying first- and second-order networks are strongly 
connected. Assuming that D is the diameter of the corresponding aggregate network, for all 
s ≥ D we thus necessarily arrive at a situation where the reach centralities of all nodes are 
identical. For the results in Fig. 4 this implies that for any s ≥ D the correlation values are 
undefined since the first- (or second-)order centralities of all nodes are the same. We thus 
only plot the Kendall-Tau and Pearson correlation coefficients τ and ρ for s < D, in which 
case they are well-defined. 
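
The following Python sketch illustrates why the coefficients must be computed per value of s and why they become undefined once all nodes have the same reach centrality. It is a plain re-implementation for illustration: the tau-b variant of the Kendall-Tau coefficient is an assumption, since the variant is not stated here, and the centrality vectors are made-up toy values rather than results from any of the six data sets.

```python
import math


def pearson(x, y):
    """Pearson correlation; NaN if either vector is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return float("nan")
    return sxy / math.sqrt(sxx * syy)


def kendall_tau_b(x, y):
    """Kendall tau-b rank correlation; NaN if either vector is constant."""
    concordant = discordant = not_tied_x = not_tied_y = 0
    for i in range(len(x)):
        for j in range(i + 1, len(x)):
            dx, dy = x[i] - x[j], y[i] - y[j]
            if dx != 0:
                not_tied_x += 1
            if dy != 0:
                not_tied_y += 1
            if dx * dy > 0:
                concordant += 1
            elif dx * dy < 0:
                discordant += 1
    if not_tied_x == 0 or not_tied_y == 0:
        return float("nan")
    return (concordant - discordant) / math.sqrt(not_tied_x * not_tied_y)


# Toy reach centralities of four nodes, keyed by maximum path length s.
temporal_by_s = {2: [3, 2, 2, 1], 3: [3, 3, 3, 3]}
approx_by_s = {2: [3, 2, 1, 1], 3: [3, 3, 3, 3]}

# One (rho, tau) pair per value of s.
correlations = {
    s: (pearson(temporal_by_s[s], approx_by_s[s]),
        kendall_tau_b(temporal_by_s[s], approx_by_s[s]))
    for s in temporal_by_s
}
```

For s = 3 in this toy example every node reaches all others, so both vectors are constant and both coefficients evaluate to NaN, mirroring the undefined correlation values discussed above.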

For s = 1, the only time-respecting paths considered consist of single links, and thus 
the temporal reach centralities by definition exactly correspond to the reach centralities 
calculated from the first- and second-order topologies. Consequently, for s = 1 we have a 
Kendall-Tau coefficient of τ = 1 and a Pearson coefficient of ρ = 1 both for the first- and the 
second-order reach centrality. For s = 2 there is, again by definition, no difference between 
the temporal and the second-order reach centralities. However, the correlation values for the 
first-order reach centrality decrease, since the first-order aggregate network does not 
accurately represent the structure of time-respecting paths of length two. For values s > 2, 
ρ and τ decrease both for the first- and the second-order centralities, since neither 
representation can accurately represent time-respecting paths with lengths s > 2. However, 
the results also highlight the important fact that second-order reach centralities better 
approximate temporal reach centralities for all values of s > 2. 

Figure 4: Pearson ρ and Kendall τ correlation coefficients between the temporal and the 
first-order reach centralities (black lines) and the temporal and the second-order reach 
centralities (orange lines) for (a) the Ants data set, (b) the E-Mail data set, (c) the Hospital 
data set, (d) the Reality Mining data set, (e) the Flights data set, and (f) the London Tube 
data set. Inset: zoom to the area where there is a small deviation between values for the case 
of the London Tube data set. 

We conclude this section by providing detailed results for the specific value of s = 3. 
Choosing a parameter s > 2 ensures that the second-order reach centrality does not trivially 
yield correlation values of 1, as it would for s = 2, where we only consider time-respecting 
paths of length at most two, which are fully captured in the second-order aggregate network. 
However, since the diameter of the first-order aggregate network for two of our systems (RM 
and HO) is equal to three, we can only report results on the correlations between the temporal 
and the first-order reach centralities for four data sets. The results for the first-order reach 
centrality with s = 3 are shown in Table 3. 

Remarkably, for the (LT) data set we observe a perfect correlation with the temporal 
reach centrality, which means that for this data set reach centralities are seemingly not 
affected by the temporal characteristics of the system. This is different for (FL), for which 
we observe a small Pearson correlation of ρ = 0.41, with an associated Kendall-Tau correlation 
of τ = 0.27. These results show that, for the (FL) data set, temporal characteristics of the 
data do not allow temporal reach centralities to be approximated based on the first-order 
aggregate network. For the second-order reach centralities shown in the right columns of 
Table 3, we observe a significant increase in both the Pearson and the Kendall-Tau correlation 
coefficients for all of the data sets, except for (LT). The largest increase of the Pearson 
correlation coefficient is again obtained for (EM), increasing from 0.61 to 0.94 with an 
associated increase of the Kendall-Tau correlation coefficient from 0.60 to 0.81. We thus 
conclude that, again, second-order reach centralities better capture the true (temporal) 
importance of nodes than a simple first-order approximation. 

                     CoC^{temp} - CoC^{(1)}               CoC^{temp} - CoC^{(2)} 
                     Pearson           Kendall-Tau        Pearson           Kendall-Tau 
London Tube (LT)     1.00 (4.65e-168)  1.00 (9.32e-64)    1.00 (1.92e-173)  1.00 (7.00e-64) 
Ants (AN)            0.72 (8.23e-11)   0.59 (1.38e-11)    0.96 (9.50e-36)   0.86 (6.40e-23) 
E-Mail (EM)          0.61 (3.17e-11)   0.60 (3.55e-18)    0.94 (2.74e-44)   0.81 (1.52e-31) 
RealityMining (RM)   NA                NA                 0.68 (3.76e-09)   0.66 (2.78e-13) 
Hospital (HO)        NA                NA                 0.80 (7.95e-13)   0.74 (7.23e-15) 
Flights (FL)         0.41 (4.68e-06)   0.27 (1.46e-05)    0.62 (1.44e-13)   0.73 (6.53e-31) 

Table 3: Pearson and Kendall-Tau rank correlation coefficients between temporal reach 
centrality (ground truth) and reach centrality for s = 3 calculated based on the first-order 
aggregate network (left columns) and the second-order aggregate network (right columns). 
Values in parentheses indicate the p-value. 


5 Conclusion 

In summary, we have introduced a framework for the analysis of path-based notions of node 
centralities in temporal networks. In particular, we defined temporal versions of three path- 
based centrality measures which highlight the influence of the temporal-topological dimension 
introduced by the specific timing and ordering of time-stamped links in temporal networks. 
Using six data sets on real-world temporal networks, we have studied to what extent static 
notions of betweenness, closeness and reach centrality differ from their temporal counterparts. 
While for some data sets node centralities in the (first-order) time-aggregated, static network 
can be used as reasonable proxies for temporal centralities, our results show that for other 
data sets this is not the case. Here we found that an analysis of time-aggregated static 
networks that neglect the time dimension can yield misleading results about the importance 
of nodes. 

In order to overcome these limitations, we have further introduced higher-order aggregate 
networks, a simple yet powerful generalization of the commonly used time-aggregated static 
perspective on time-stamped network data. The basic idea of this construction is that a k-th 
order aggregate network captures the statistics of time-respecting paths of length k, thus 
facilitating a higher-order analysis that incorporates both the topology and the ordering 
of links in temporal networks. We demonstrate the power of this framework through the 
definition of second-order centralities which can easily be calculated based on shortest paths 
in a second-order aggregate network. Despite the fact that these centralities can easily be 
calculated based on a simple static network structure, we find that the resulting second-order 
centrality measures better capture the true temporal centralities of nodes in the underlying 
temporal networks. 
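
For k = 2, the construction idea can be sketched in a few lines of Python. The sketch treats every first-order link as a second-order node and adds a second-order link for every time-respecting path of length two. It makes two assumptions that are labeled as such: consecutive links (a, b, t1) and (b, c, t2) are taken to be time-respecting when 0 < t2 - t1 is at most a maximum time difference delta, and link weights are plain path counts, which is a simplification and may differ from the weighting used in the definition of the second-order aggregate network. The function name and the example links are hypothetical.

```python
from collections import Counter


def second_order_edges(links, delta):
    """Count time-respecting paths of length two.

    links: iterable of (source, target, time) tuples.
    Returns a Counter keyed by ((a, b), (b, c)), i.e. by second-order link.
    """
    links = list(links)
    weights = Counter()
    for (a, b, t1) in links:
        for (b2, c, t2) in links:
            # The second link must start where the first one ended
            # and follow it within the assumed time window.
            if b2 == b and 0 < t2 - t1 <= delta:
                weights[((a, b), (b, c))] += 1
    return weights


# Hypothetical time-stamped links (not one of the six empirical data sets).
toy_links = [("a", "c", 1), ("c", "e", 2), ("b", "c", 3), ("c", "d", 4)]

edges2 = second_order_edges(toy_links, delta=1)
```

In this toy example the static first-order network contains the paths a-c-d and a-c-e, but only a-c-e is time-respecting, so the second-order network contains the link from a-c to c-e and none from a-c to c-d.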

In closing, we would like to highlight a number of open issues which we plan to consider 
in future work. First and foremost, all of our results have been obtained based on simple 
unweighted notions of centralities, even though in principle both the first- and second-order 
aggregate networks allow for the definition of link weights. Hence, our results have been 
obtained based on a rather simple perspective which does not incorporate the full information 
about path statistics preserved by our higher-order aggregate network abstraction. We thus 
expect a future extension to weighted higher-order aggregate networks to capture the true 
temporal centralities of nodes even more closely. Furthermore, while we can in principle 
define higher-order networks of any order k, in our work we have merely studied second-order 
representations and the corresponding generalizations of path-based centralities. Our 
choice to limit our study to an order of k = 2 is mainly due to the amount of available data, 
which for the six temporal networks studied in this work does not allow us to obtain meaningful 
statistics for the longer time-respecting paths of length k that are the basis for a k-th 
order aggregate network. Under what conditions higher-order aggregate networks with orders 
of k > 2 can help us to obtain even better approximations for temporal centralities is thus 
an open question that should be studied in the future. 

Despite these open issues, we consider the fact that the simple second-order centrality 
measures introduced in our work already yield good approximations of the underlying 
temporal centralities a promising aspect of our framework. In this respect, second-order 
time-aggregated representations of temporal networks can be considered a simple, yet powerful 
abstraction for the higher-order analysis of time-stamped network data. 